https://github.com/csglab/CRIES
Step 1. Creating GTF annotation files
Intron and exon annotations for gencode.v34 made within my notebook named build-genome.ipynb.
Step 2. Mapping reads
STAR
Step 3. Counting reads that map to intronic or exonic segments of each gene
featureCounts
http://bioinf.wehi.edu.au/featureCounts/
Step 4. Normalization - REMBRANDTS
REMBRANDTS is a package for analysis of RNA-seq data across multiple samples in order to obtain unbiased estimates of differential mRNA stability. It uses DESeq to obtain estimates of differential pre-mRNA and mature mRNA abundance across samples, and then estimates a gene-specific bias function that is then subtracted from ĪexonāĪintron to provide unbiased differential mRNA stability measures.
Limma is originally design for micro-array experiments which is mainly doing same task as DESeq2. Comparing to RNA expression analysis, RNA stability may have negetive values; DESeq2 does not support negetive values but Limma does. Therefore, we decided to use Limma package instead of DESeq2 for differential analysis.
I've learned how to use limma from this DataCamp course | differential-expression-analysis-with-limma-in-r. However, I found these links useful while browsing.
%%R
volcanoplot(fit2,highlight = 6, names = fit2$genes[,'name'])
Box plot generator for top hits ..
%%R
temp = fData(eset) %>% rownames_to_column('row')
gene = temp$row[temp$name == 'ABHD2']
# Create a boxplot of a given gene
boxplot(exprs(eset)[gene, ] ~ pData(eset)[, "cond"],main = fData(eset)[gene, "name"])
boxplot(exprs(eset)[gene, ] ~ pData(eset)[, "time"],main = fData(eset)[gene, "name"])
%%R
temp = fData(eset) %>% rownames_to_column('row')
gene = temp$row[temp$name == 'SORD']
# Create a boxplot of a given gene
boxplot(exprs(eset)[gene, ] ~ pData(eset)[, "cond"],main = fData(eset)[gene, "name"])
boxplot(exprs(eset)[gene, ] ~ pData(eset)[, "time"],main = fData(eset)[gene, "name"])
Just like the RNA-seq experssion analysis, sample 72h_treated_rep2 comes up as an outlier. Removing that from the analysis give us a better representation. Therfore, we can see that treated samples at 6h cluster with the non-treated samples which suggest that 6 hours treatment with the drug is not as effective as 72h and 120h. Although, we will check the variant genes in this time-point in the following statistical analysis.
%%R
# Plot principal components labeled by treatment
filter = !eset@phenoData@data$sample_id == '72h_treated_rep2'
plotMDS(eset[,filter], labels = pData(eset[,filter])[, "time"], col=col_by_cond[filter], gene.selection = "common")
%%R
# Quantile normalize
exprs(eset_norm) <- normalizeBetweenArrays(exprs(eset_norm))
plotDensities(eset_norm, legend = FALSE)
Let's write normalized RNA-Stabilities into a file.
%%R
ncu <- exprs(eset_norm) %>% replace_na(0)
write.table(ncu,'stbl_norm_log_quantile.txt', sep="\t", quote=FALSE, col.names=TRUE)
ncu %>% summary
%%R
# Obtain the summary statistics
stats_120h <- topTable(fit2, coef = "log2FC_120h", number = nrow(fit2),
sort.by = "none")
# Create histograms of the p-values for each contrast
hist(stats_120h[, "P.Value"])
%%R
# Obtain the summary statistics
stats_6h <- topTable(fit2, coef = "log2FC_6h", number = nrow(fit2),
sort.by = "none")
# Create histograms of the p-values for each contrast
hist(stats_6h[, "P.Value"])